UMass at TREC 2004: Novelty and HARD
Authors
Abstract
For the TREC 2004 Novelty track, UMass participated in all four tasks. Although finding relevant sentences was harder this year than last, we continue to show marked improvements over the baseline of calling all sentences relevant, with a variant of tf-idf being the most successful approach. We achieve 5–9% improvements over the baseline in locating novel sentences, primarily by looking at the similarity of a sentence to earlier sentences and focusing on named entities. For the High Accuracy Retrieval from Documents (HARD) track, we investigated the use of clarification forms, fixed- and variable-length passage retrieval, and the use of metadata. Clarification form results indicate that passage-level feedback can provide improvements comparable to user-supplied related text for document evaluation, and that it outperforms related text for passage evaluation. Document retrieval methods without a query expansion component show the largest gains from related text. We also found that displaying the top passages for feedback outperformed displaying centroid passages. Named entity feedback resulted in mixed performance. Our primary findings for passage retrieval are that document retrieval methods performed better than passage retrieval methods on the passage evaluation metric of binary preference at 12,000 characters, and that clarification forms improved passage retrieval for every retrieval method explored. We found no benefit to using variable-length passages over fixed-length passages for this corpus. Our use of geography and genre metadata resulted in no significant changes in retrieval performance.
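The abstract attributes the novelty gains mainly to comparing each sentence with the sentences that came before it. The sketch below illustrates that general idea only; it is not the UMass system. The raw term-count vectors, the cosine measure, the `novel_sentences` helper, and the `threshold` value of 0.8 are all assumptions chosen for illustration, and named-entity handling is left out.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def novel_sentences(sentences, threshold=0.8):
    """Keep sentences whose best match among all earlier sentences is below threshold.

    The threshold is a hypothetical value, not one reported in the source.
    """
    seen = []   # term-count vectors of every sentence processed so far
    novel = []
    for sentence in sentences:
        vec = Counter(tokenize(sentence))
        # Highest similarity to any earlier sentence; 0.0 for the first sentence.
        max_sim = max((cosine(vec, prev) for prev in seen), default=0.0)
        if max_sim < threshold:
            novel.append(sentence)
        seen.append(vec)
    return novel
```

Calling `novel_sentences` on an ordered list of relevant sentences returns the subset judged novel under these assumptions. A lower threshold discards more sentences as redundant, and a higher one keeps more.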
Similar resources
UMass at TREC 2002: Cross Language and Novelty Tracks
The University of Massachusetts participated in the cross-language and novelty tracks this year. The cross-language submission was characterized by combination of evidence to merge results from two different retrieval engines and a variety of different resources – stemmers, dictionaries, machine translation, and an acronym database. We found that proper names were extremely important in this ye...
NLPR at TREC 2005: HARD Experiments
This is the third time that the Chinese Information Processing Group of NLPR has taken part in TREC. In the past, we participated in the Novelty track and the Robust track, in which we evaluated our two key notions: Window-based Retrieval Algorithm and Result Emerging Strategy [1][2]. This year we focus on investigating the significance of relevance feedback, so the HARD track is our best choice. HARD...
Improved Feature Selection and Redundance Computing - THUIR at TREC 2004 Novelty Track
This is the third year that the Tsinghua University Information Retrieval Group (THUIR) has participated in the Novelty task of TREC. Our research on this year’s novelty track mainly focused on four aspects: (1) text feature selection and reduction; (2) improved sentence classification in finding relevant information; (3) efficient sentence redundancy computing; (4) effective result filtering. All experime...
Graph-Based Text Representation For Novelty Detection
We discuss several feature sets for novelty detection at the sentence level, using the data and procedure established in task 2 of the TREC 2004 novelty track. In particular, we investigate feature sets derived from graph representations of sentences and sets of sentences. We show that a highly connected graph produced by using sentence-level term distances and pointwise mutual information can ...
ISI Novelty Track System for TREC 2004
We describe our system developed at ISI for the Novelty track at TREC 2004. The system’s two modules recognize relevant event sentences and relevant opinion sentences, respectively. We focused mainly on recognizing relevant opinion sentences using various opinion-bearing word lists. Of our 5 runs submitted for task 1, the best run provided an F-score of 0.390 (precision 0.30 and recall 0.71).
Publication date: 2004